Biological Pattern Discovery with R Machine Learning Approaches (Zheng Rong Yang)

The distribution of the numbers of the bimodal genes identified by the bimodal … in ten runs for the letrozole drug data.

Simulated data analysis

The simulated data set consisted of 1,000 genes, each with 40 replicates. Of these simulated genes, 100 were designed as bimodal genes. In … 20 genes (ten unimodal and ten bimodal) were composed of … two outliers. Table 6.9 shows four confusion matrices for the LR (likelihood ratio) test, the GM (gap maximisation) test and the BI (bimodal index) test. Figure 6.41 shows the ROC curves for the three tests (GM, LR and BI). The GM and BI tests performed well compared with the LR test.
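A ROC curve such as those in Figure 6.41 is traced by sweeping a decision threshold over each test's per-gene scores and recording the true- and false-positive rates against the known design labels (100 bimodal and 900 unimodal genes). The following is a minimal NumPy sketch under stated assumptions, not the code used to produce the figure: `roc_curve` is a hypothetical helper, a larger score is assumed to mean stronger evidence of bimodality (for a p-value-based test such as GM, pass the negated p values), and the area under the curve is computed with the trapezoidal rule.

```python
import numpy as np

def roc_curve(scores, labels):
    """Return (fpr, tpr, auc); larger score = stronger evidence of bimodality.

    labels: 1 for a gene designed as bimodal, 0 for a gene designed as unimodal.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    # Rank genes from the most to the least bimodal-looking score.
    order = np.argsort(-scores, kind="mergesort")
    s, y = scores[order], labels[order]
    # Place one threshold after each run of tied scores.
    cut = np.r_[np.nonzero(np.diff(s))[0], y.size - 1]
    tp = np.cumsum(y)[cut]      # bimodal genes called bimodal
    fp = (cut + 1) - tp         # unimodal genes called bimodal
    tpr = np.r_[0.0, tp / y.sum()]
    fpr = np.r_[0.0, fp / (y.size - y.sum())]
    # Area under the curve by the trapezoidal rule.
    auc = float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))
    return fpr, tpr, auc
```

Plotting `tpr` against `fpr` for each of the three tests gives one curve per test; a test that ranks every designed bimodal gene above every unimodal gene reaches an AUC of 1.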

Table 6.9 The confusion matrices of the LR, BI and GM tests run on the simulated data in the gene expression bimodality pattern discovery simulation. UN stands for unimodal and BM stands for bimodal. MCC stands for the Matthews correlation coefficient […, 1977].

[Table 6.9 body not fully recoverable; surviving entries: 898, 900, 100; MCC: 0.9836, 0.2120, 0.9377]
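Each MCC value in Table 6.9 summarises one 2×2 confusion matrix in a single number between −1 and 1. Treating BM as the positive class and UN as the negative class, MCC = (TP·TN − FP·FN) / sqrt((TP + FP)(TP + FN)(TN + FP)(TN + FN)). The sketch below computes it from the four cell counts; the function name is hypothetical, and returning 0 when any row or column total is zero is a common convention rather than something stated in the source.

```python
import math

def matthews_corrcoef(tp, fn, fp, tn):
    """MCC from a 2x2 confusion matrix (BM = positive class, UN = negative class)."""
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # An empty row or column makes the ratio undefined; report 0 by convention.
    return num / den if den > 0 else 0.0
```

For example, `matthews_corrcoef(tp=95, fn=5, fp=3, tn=897)` scores a hypothetical test on the 100 bimodal and 900 unimodal genes; these counts are illustrative only and are not the Table 6.9 entries.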

Figure 6.42 shows the log-transformed gene expression profile of a designed bimodal gene. The GM test identified it with a p value of 0.0065, but the LR test failed to identify it. The bimodal index test also missed this gene: its bimodal index value was 0.69, whereas the threshold separating bimodal from unimodal genes for this simulated data set was 1.84.
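One widely used definition of a bimodality index is that of Wang et al. (2009): BI = sqrt(p(1 − p)) · |μ1 − μ2| / σ, where a gene's values are modelled as two normal components with means μ1 and μ2, a shared standard deviation σ and mixing proportion p. Whether the BI test above follows exactly this definition is an assumption. The sketch below also swaps the usual two-component Gaussian mixture fit for a simpler exhaustive one-dimensional two-means split of the sorted values, purely for illustration; `bimodality_index` is a hypothetical name.

```python
import numpy as np

def bimodality_index(x):
    """Rough BI: split sorted values into two groups by minimising the
    within-group sum of squares, then BI = sqrt(p(1-p)) * |mu1 - mu2| / sigma."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    best = None
    for k in range(2, n - 1):  # each group keeps at least two values
        a, b = x[:k], x[k:]
        sse = ((a - a.mean()) ** 2).sum() + ((b - b.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, k)
    sse, k = best
    a, b = x[:k], x[k:]
    sigma = np.sqrt(sse / (n - 2))  # pooled within-group standard deviation
    p = k / n                       # proportion of values in the lower group
    if sigma == 0:
        return float("inf")
    return float(np.sqrt(p * (1 - p)) * abs(b.mean() - a.mean()) / sigma)
```

A gene would then be called bimodal when its index exceeds a data-set-specific threshold, such as the 1.84 used for the simulated data above; an index of 0.69 falls well short of it, which is how a designed bimodal gene can be missed by this test.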